| Name | Version | Summary | Date |
|---|---|---|---|
| llm-markdownify | 0.2.1 | Convert PDFs, images to high-quality Markdown using Vision LLMs. | 2025-08-09 14:58:17 |
| docstrange | 1.1.2 | Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. | 2025-08-07 13:45:30 |
| vietcombank-captcha | 0.1.0 | Lightweight CAPTCHA predictor for Vietcombank using ONNX | 2025-08-06 04:43:20 |
| aspose-total-net | 25.7.0 | Aspose.Total for Python via .NET is a Document Processing python class library that allows developers to work with Microsoft Word®, Microsoft PowerPoint®, Microsoft Outlook®, OpenOffice®, & 3D file formats without needing Office Automation. | 2025-08-05 23:32:27 |
| marker-pdf | 1.8.3 | Convert documents to markdown with high speed and accuracy. | 2025-08-04 18:18:40 |
| kreuzberg | 3.10.1 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-07-31 11:54:20 |
| huaweicloudsdkocr | 3.1.160 | OCR | 2025-07-31 09:51:16 |
| ai-resume-parser | 1.0.6 | AI-powered resume parser with parallel processing for multiple file formats (PDF, DOCX, images, etc.) | 2025-07-29 23:13:04 |
| document-data-extractor | 1.0.4 | Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-29 08:25:56 |
| dedoc | 2.4 | Extract content and logical tree structure from textual documents | 2025-07-28 09:47:38 |
| cloudflare-peek | 0.1.0 | A Python utility for scraping Cloudflare-protected websites using screenshot + OCR fallback | 2025-07-27 16:41:12 |
| cleanit | 0.4.9 | Subtitles extremely clean | 2025-07-26 19:02:05 |
| llm-data-converter | 2.2.0 | Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract | 2025-07-25 13:32:07 |
| nanonets-extractor | 0.1.4 | A unified document extraction library supporting local CPU, GPU, and cloud processing | 2025-07-23 11:17:54 |
| invoice-ocr-mcp | 1.0.4 | Enterprise invoice OCR MCP server: a professional invoice recognition solution based on ModelScope | 2025-07-17 07:23:13 |
| mseep-kreuzberg | 3.8.2 | Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats | 2025-07-17 03:32:28 |
| mpxpy | 0.0.17 | Official Mathpix client for Python | 2025-07-16 19:30:22 |
| django-ocr_translate | 0.6.3 | Django app for OCR and translation | 2025-07-16 11:42:24 |
| SpectrePDF | 0.2.1 | A tool for processing and redacting PDFs based on target words using OCR. | 2025-07-14 17:57:14 |
| docforge | 0.1.0 | Forge perfect documents from any format with precision, power, and simplicity | 2025-07-13 22:29:47 |